{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluating OpenCL Caffe caching mechanisms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of Contents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. [Overview](#overview)\n",
    "1. [Experimental analysis code](#code) [for developers]\n",
    "1. [Mali-T628](#mali_t628)\n",
    "  1. [Original caching mechanism](#mali_t628_original)\n",
    "  1. [Proposed caching mechanism](#mali_t628_proposed)\n",
    "  1. [Compare the proposed mechanism vs the original mechanism](#mali_t628_compare)\n",
    "1. [GTX 1080](#gtx_1080)\n",
    "  1. [Original caching mechanism](#gtx_1080_original)\n",
    "  1. [Proposed caching mechanism](#gtx_1080_proposed)\n",
    "  1. [Compare the proposed mechanism vs the original mechanism](#gtx_1080_compare)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"overview\"></a>\n",
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Jupyter notebook studies the performance (speed) of OpenCL API build and compile calls for the Caffe framework built with the ViennaCL library using two mechanisms:\n",
    "- [`original`] the current ViennaCL caching mechanism, in which the program binary is cached after the `clBuildProgram()` call but before the `clCreateKernelsInProgram()` call;\n",
    "- [`proposed`] the new caching mechanism proposed by [dividiti](http://dividiti.com), in which the program binary is cached after both the `clBuildProgram()` and `clCreateKernelsInProgram()` calls;\n",
    "\n",
    "on two experimental platforms:\n",
    "- [[Mali-T628](#mali_t628)] ARM Mali-T628 GPU in the Odroid-XU3 development platform (GPU driver v12.0);\n",
    "- [[GTX 1080](#gtx_1080)] NVIDIA GTX 1080 GPU installed in a HP 640 workstation (GPU driver v375.26)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our experiments show that ARM's OpenCL implementation only compiles kernels when the user invokes the `clCreateKernelsInProgram()` API call; therefore, on the Mali-T628 platform, the original mechanism is ineffective; the proposed mechanism accelerates the OpenCL Caffe initialisation time by over 50 times on subsequent invocations.\n",
    "\n",
    "Our experiments also show that NVIDIA's OpenCL implementation compiles kernels when the user invokes the `clBuildProgram()` API call; therefore, on the GTX 1080 platform, both the original and the proposed mechanisms perform similarly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Common experimental setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experiments were performed using the [Collective Knowledge](http://cknowledge.org) framework for reproducible and collaborative R&amp;D using the following preparatory steps on each of the platforms."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For simplicity, only one instance of Caffe, one instance of Caffe model and one instance of [dividiti's OpenCL profiler](https://github.com/dividiti/dvdt-prof) were registered on the platform at the time e.g.:\n",
    "```\n",
    "$ ck show env --tags=caffemodel\n",
    "Env UID:         Target OS: Bits: Name:                                         Version: Tags:\n",
    "e8811419b9c1149c   linux-32    32 Caffe model (net and weights) (bvlc, alexnet) trunk    32bits,alexnet,bvlc,caffe,caffemodel,host-os-linux-32,mirror,net,target-os-linux-32,v0,weights\n",
    "$ ck show env --tags=lib,caffe     \n",
    "Env UID:         Target OS: Bits: Name:                                  Version:       Tags:\n",
    "53834f239eff6c18   linux-32    32 BVLC Caffe framework (opencl,viennacl) master-69f35c5 32bits,bvlc,caffe,host-os-linux-32,lib,target-os-linux-32,v0,v0.0,vmaster,vopencl\n",
    "$ ck show env --tags=dvdt,prof\n",
    "Env UID:         Target OS: Bits: Name:                                  Version: Tags:\n",
    "2ea881239f688658   linux-32    32 dividiti's OpenCL API profiler (cjson) trunk    32bits,cjson,dividiti,dvdt,host-os-linux-32,opencl,prof,profiler,target-os-linux-32,tool,v0,vtrunk\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To start with, clean previous installations and experiments, and install ViennaCL source-only packages:\n",
    "1. Clean all previous OpenCL Caffe caching mechanism experiments:\n",
    "```\n",
    "$ ck rm local:experiment:original*cache* --force\n",
    "$ ck rm local:experiment:proposed*cache* --force\n",
    "```\n",
    "\n",
    "1. Clean all previous installations of Caffe with ViennaCL and ViennaCL sources:\n",
    "```\n",
    "$ ck clean env --tags=lib,caffe,vviennacl\n",
    "$ ck clean env --tags=lib,viennacl,vsrc\n",
    "```\n",
    "\n",
    "1. Install [ViennaCL's master](https://github.com/viennacl/viennacl-dev) (**original**) and [dividiti's fork](https://github.com/dividiti/viennacl-dev) (**proposed**):\n",
    "```\n",
    "$ ck install package:lib-viennacl-master-src\n",
    "$ ck install package:lib-viennacl-dvdt-src\n",
    "```\n",
    "\n",
    "First, perform the **original** set of experiments:\n",
    "1. Install Caffe with ViennaCL, while selecting [ViennaCL's master](https://github.com/viennacl/viennacl-dev):\n",
    "  1. On the GTX 1080 platform:\n",
    "```\n",
    "$ ck install package:lib-caffe-bvlc-opencl-viennacl-universal\n",
    "``` \n",
    "Also, on this platform, `ck-caffe:program:caffe` sometimes fails after reporting the benchmarking results; to perform the requested number of repetitions even if some of them fail, change `\"ignore_return_code\":\"no\"` to `\"ignore_return_code\":\"yes\"` (for the `time_gpu` command in `program/caffe/.cm/meta.json`).\n",
    "\n",
    "  1. On the Mali-T628 platform:\n",
    "```\n",
    "$ ck install package:lib-caffe-bvlc-opencl-viennacl-universal \\\n",
    "  --env.DISABLE_DOUBLE_SUPPORT=ON \\\n",
    "  --env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON \\\n",
    "  --env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=3\n",
    "```\n",
    "1. Run the experiments as detailed below ([Mali-T628](#mali_t628_original) or [GTX 1080](#gtx_1080_original)).\n",
    "\n",
    "1. Remove Caffe with ViennaCL (***necessary*** before performing the **proposed** set of experiments):\n",
    "```\n",
    "$ ck clean env --tags=lib,caffe,vviennacl\n",
    "```\n",
    "\n",
    "Then, perform the **proposed** set of experiments:\n",
    "1. Install Caffe with ViennaCL, while selecting [dividiti's fork](https://github.com/dividiti/viennacl-dev) e.g.\n",
    "```\n",
    "$ ck install package:lib-caffe-bvlc-opencl-viennacl-universal\n",
    "```\n",
    "\n",
    "1. Run the experiments as detailed below ([Mali-T628](#mali_t628_proposed) or [GTX 1080](#gtx_1080_proposed)).\n",
    "1. Remove Caffe with ViennaCL (***optional***):\n",
    "```\n",
    "$ ck clean env --tags=lib,caffe,vviennacl\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"code\"></a>\n",
    "## Data wrangling code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Includes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Standard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import json\n",
    "import re"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Date util "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import dateutil.parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Scientific"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If some of the scientific packages are missing, please install them using:\n",
    "```\n",
    "# pip install jupyter pandas numpy matplotlib\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import IPython as ip\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib as mp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print ('IPython version: %s' % ip.__version__)\n",
    "print ('Pandas version: %s' % pd.__version__)\n",
    "print ('NumPy version: %s' % np.__version__)\n",
    "print ('Matplotlib version: %s' % mp.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from matplotlib import cm\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "from IPython.core.display import HTML"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Collective Knowledge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If CK is not installed, please install it using:\n",
    "```\n",
    "# pip install ck\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import ck.kernel as ck\n",
    "print ('CK version: %s' % ck.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create, build and compile OpenCL API calls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# OpenCL API calls to create program.\n",
    "create_calls = [ 'clCreateProgramWithSource', 'clCreateProgramWithBinary' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# OpenCL API calls to build program.\n",
    "build_calls = [ 'clBuildProgram' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# OpenCL API calls to compile kernels.\n",
    "compile_calls = [ 'clCreateKernel', 'clCreateKernelsInProgram' ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# All OpenCL API calls to create program, build program and compile kernels.\n",
    "create_build_compile_calls = create_calls + build_calls + compile_calls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculate time elapsed between two ISO timestamps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Return the difference between the end and start timestamps in seconds.\n",
    "def ts_delta_s(ts_end, ts_start):\n",
    "    delta = dateutil.parser.parse(ts_end) - dateutil.parser.parse(ts_start)\n",
    "    delta_s = delta.total_seconds()\n",
    "    return delta_s\n",
    "\n",
    "# Return the difference between the end and start timestamps in milliseconds.\n",
    "def ts_delta_ms(ts_end, ts_start):\n",
    "    delta_s = ts_delta_s(ts_end, ts_start)\n",
    "    delta_ms = delta_s * 1e3\n",
    "    return delta_ms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Access the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def get_results(repo_uoa, common_tags):\n",
    "    module_uoa = 'experiment'\n",
    "    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':common_tags})\n",
    "    if r['return']>0:\n",
    "        print (\"Error: %s\" % r['error'])\n",
    "        exit(1)\n",
    "    experiments = r['lst']\n",
    "    \n",
    "    experiment_dfs = []\n",
    "    for experiment in experiments:\n",
    "        data_uoa = experiment['data_uoa']\n",
    "        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})\n",
    "        if r['return']>0:\n",
    "            print (\"Error: %s\" % r['error'])\n",
    "            exit(1)\n",
    "\n",
    "        unique_tags = ','.join([ tag for tag in r['dict']['tags'] if tag not in common_tags])\n",
    "        point_dfs = []\n",
    "        for point in r['points']:\n",
    "            point_file_path = os.path.join(r['path'], 'ckp-%s.0001.json' % point)\n",
    "            with open(point_file_path) as point_file:\n",
    "                print ('Reading: %s...' % point_file_path)\n",
    "                point_data_raw = json.load(point_file)\n",
    "            # Traces for all repetitions of this point.\n",
    "            trace_list = [\n",
    "                characteristics['run'].get('dvdt_prof',[]) for characteristics in point_data_raw['characteristics_list']\n",
    "            ]                \n",
    "            # All OpenCL API calls to create program, build program and compile kernels.\n",
    "            create_build_compile_dfs = []\n",
    "            for trace in trace_list:\n",
    "                # Only include the first repetition of the 'cache-cold' experiments\n",
    "                # (as the subsequent ones are in fact 'cache-warm').\n",
    "                if (unique_tags=='cache-cold' or unique_tags=='cuda-cache-cold') and create_build_compile_dfs: continue\n",
    "                create_build_compile_trace = [\n",
    "                    { 'call' : call['call'], 'time_ms': ts_delta_ms(call['timestamp']['end'], call['timestamp']['start']) }\n",
    "                    for call in trace if call['call'] in create_build_compile_calls \n",
    "                ]\n",
    "                create_build_compile_df = pd.DataFrame(create_build_compile_trace).set_index(['call'], append=True)\n",
    "                create_build_compile_df.index.names = [ 'id', 'call' ]\n",
    "                create_build_compile_dfs.append(create_build_compile_df)\n",
    "            # Aggregate all calls.\n",
    "            point_df = pd.concat(create_build_compile_dfs, axis=1)\n",
    "            point_dfs.append(point_df)\n",
    "        # Aggregate all points.\n",
    "        experiment_df = pd.concat(point_dfs)\n",
    "        experiment_df.columns = [ [unique_tags]*len(experiment_df.columns), range(len(experiment_df.columns)) ]\n",
    "        experiment_df.columns.names = [ 'experiment', 'repetition' ]\n",
    "        experiment_dfs.append(experiment_df)\n",
    "    # Aggregate all experiments.\n",
    "    result_df = pd.concat(experiment_dfs, axis=1)\n",
    "    # Convert to preferred format (unify clCreateProgram* calls, repetitions as columns, replace missing data with zeros).\n",
    "    result_df = result_df \\\n",
    "        .rename(index={'clCreateProgramWithBinary':'clCreateProgram*', 'clCreateProgramWithSource':'clCreateProgram*'}) \\\n",
    "        .stack('experiment')\n",
    "    return result_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Show the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def show_results(results):\n",
    "    pd.options.display.max_columns = len(results.columns)\n",
    "    pd.options.display.max_rows = len(results.index)\n",
    "    return results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def plot_results(results, title='Execution time (ms)', rot=0):\n",
    "    mean = results.mean(axis=1).unstack('experiment')\n",
    "    std  = results.std(axis=1).unstack('experiment')\n",
    "    ymax = mean.max().max()\n",
    "    mean.plot(yerr=std, kind='bar', title=title,\n",
    "        rot=rot, figsize=[16, 8], ylim=[0,ymax*1.05],\n",
    "        grid=True, legend=True, colormap=cm.autumn\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def compare_results(original, proposed, experiment, call):\n",
    "    def cumulative_per_experiment_per_call(results, experiment, call):\n",
    "        return results \\\n",
    "            .reorder_levels(['experiment', 'call', 'id']) \\\n",
    "            .loc[experiment] \\\n",
    "            .loc[call] \\\n",
    "            .mean(axis=1).sum()\n",
    "    \n",
    "    original_per_experiment_per_call = cumulative_per_experiment_per_call(original, experiment, call)\n",
    "    proposed_per_experiment_per_call = cumulative_per_experiment_per_call(proposed, experiment, call)\n",
    "    print ('[%s] all %s() calls w/ original: %.1f (ms)' % (experiment, call, original_per_experiment_per_call))\n",
    "    print ('[%s] all %s() calls w/ proposed: %.1f (ms)' % (experiment, call, proposed_per_experiment_per_call))\n",
    "    \n",
    "    proposed_vs_original_pc = \\\n",
    "        100.0 * (proposed_per_experiment_per_call-original_per_experiment_per_call) / original_per_experiment_per_call\n",
    "    print ('[%s] all %s() calls (proposed-original)/original: %.1f%%' % (experiment, call, proposed_vs_original_pc))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"mali_t628\"></a>\n",
    "## Mali-T628"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"mali_t628_original\"></a>\n",
    "### Mali-T628 - original caching mechanism"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experimental data were collected on the Mali-T628 experimental platform as follows:\n",
    "```\n",
    "$ export VIENNACL_CACHE_DIR=/tmp/viennacl-cache/ && rm -rf $VIENNACL_CACHE_DIR && mkdir $VIENNACL_CACHE_DIR\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_repo=local --record_uoa=original-cache-none \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,original,mali-t628,cache-none\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_repo=local --record_uoa=original-cache-cold \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=1 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,original,mali-t628,cache-cold\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_repo=local --record_uoa=original-cache-warm \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,original,mali-t628,cache-warm\n",
    "``` \n",
    "\n",
    "The experimental data were archived as follows:\n",
    "```\n",
    "$ ck zip local:experiment:original-cache* \\\n",
    "  --archive_name=ck-caffe-opencl-build-compile-original-mali-t628.zip\n",
    "```\n",
    "\n",
    "The resulting archive was copied to another machine and prepared for analysis as follows:\n",
    "```\n",
    "$ ck add repo:ck-caffe-opencl-build-compile-original-mali-t628 \\\n",
    "  --zip=ck-caffe-opencl-build-compile-original-mali-t628.zip --quiet\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "repo_uoa = 'ck-caffe-opencl-build-compile-original-mali-t628'\n",
    "common_tags = 'caffe,opencl,build,compile,original,mali-t628'\n",
    "mali_t628_original = get_results(repo_uoa, common_tags)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "show_results(mali_t628_original)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(mali_t628_original, rot=90)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `clCreateProgram*()` calls only take considerable time for the 'cache-warm' experiment (i.e. `clCreateProgramWithBinary()`), while the `clBuildProgram()` calls only take considerable time for the 'cache-none' and 'cache-cold' experiments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(mali_t628_original \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[build_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The execution time of the `clCreateKernelsInProgram()` calls, however, is practically the same whether using the original ViennaCL caching mechanism or not, which suggests it's simply ineffective on this platform."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(mali_t628_original \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[compile_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"mali_t628_proposed\"></a>\n",
    "### Mali-T628 - proposed caching mechanism"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experimental data were collected on the Mali-T628 experimental platform as follows:\n",
    "```\n",
    "$ export VIENNACL_CACHE_DIR=/tmp/viennacl-cache/ && rm -rf $VIENNACL_CACHE_DIR && mkdir $VIENNACL_CACHE_DIR\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_repo=local --record_uoa=proposed-cache-none \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,mali-t628,cache-none\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_repo=local --record_uoa=proposed-cache-cold \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=1 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,mali-t628,cache-cold\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_repo=local --record_uoa=proposed-cache-warm \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,mali-t628,cache-warm\n",
    "``` \n",
    "\n",
    "The experimental data were archived as follows:\n",
    "```\n",
    "$ ck zip local:experiment:proposed-cache* \\\n",
    "  --archive_name=ck-caffe-opencl-build-compile-proposed-mali-t628.zip\n",
    "```\n",
    "\n",
    "The resulting archive was copied to another machine and prepared for analysis as follows:\n",
    "```\n",
    "$ ck add repo:ck-caffe-opencl-build-compile-proposed-mali-t628 \\\n",
    "  --zip=ck-caffe-opencl-build-compile-proposed-mali-t628.zip --quiet\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "repo_uoa = 'ck-caffe-opencl-build-compile-proposed-mali-t628'\n",
    "common_tags = 'caffe,opencl,build,compile,proposed,mali-t628'\n",
    "mali_t628_proposed = get_results(repo_uoa, common_tags)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "show_results(mali_t628_proposed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(mali_t628_proposed, rot=90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(mali_t628_proposed \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[build_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(mali_t628_proposed \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[compile_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"mali_t628_compare\"></a>\n",
    "### Mali-T628 - compare the original mechanism vs the proposed mechanism"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "compare_results(mali_t628_original, mali_t628_proposed, 'cache-warm', 'clCreateKernelsInProgram')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "compare_results(mali_t628_original, mali_t628_proposed, 'cache-warm', 'clBuildProgram')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"gtx_1080\"></a>\n",
    "## GTX 1080"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"gtx_1080_original\"></a>\n",
    "### GTX 1080 - original caching mechanism"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experimental data were collected on the GTX 1080 experimental platform as follows:\n",
    "```\n",
    "$ export CUDA_CACHE_DIR=$HOME/.nv/ComputeCache/ && rm -rf $CUDA_CACHE_DIR\n",
    "$ export VIENNACL_CACHE_DIR=/tmp/viennacl-cache/ && rm -rf $VIENNACL_CACHE_DIR && mkdir $VIENNACL_CACHE_DIR\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=original-cuda-cache-cold \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=1 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,original,gtx-1080,cuda-cache-cold\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=original-cuda-cache-warm \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,original,gtx-1080,cuda-cache-warm\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=original-cache-none \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.CUDA_CACHE_DISABLE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,original,gtx-1080,cache-none\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=original-cache-cold \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=1 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.CUDA_CACHE_DISABLE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,original,gtx-1080,cache-cold\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=original-cache-warm \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.CUDA_CACHE_DISABLE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,original,gtx-1080,cache-warm\n",
    "```\n",
    "\n",
    "The experimental data were archived as follows:\n",
    "```\n",
    "$ ck zip local:experiment:original*cache* \\\n",
    "  --archive_name=ck-caffe-opencl-build-compile-original-gtx-1080.zip\n",
    "```\n",
    "\n",
    "The resulting archive was copied to another machine and prepared for analysis as follows:\n",
    "```\n",
    "$ ck add repo:ck-caffe-opencl-build-compile-original-gtx-1080 \\\n",
    "  --zip=ck-caffe-opencl-build-compile-original-gtx-1080.zip --quiet\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "repo_uoa = 'ck-caffe-opencl-build-compile-original-gtx-1080'\n",
    "common_tags = 'caffe,opencl,build,compile,original,gtx-1080'\n",
    "gtx_1080_original = get_results(repo_uoa, common_tags)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "show_results(gtx_1080_original)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(gtx_1080_original, rot=90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(gtx_1080_original \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[build_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(gtx_1080_original \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[compile_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"gtx_1080_proposed\"></a>\n",
    "### GTX 1080 - proposed caching mechanism"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experimental data were collected on the GTX 1080 experimental platform as follows:\n",
    "```\n",
    "$ export CUDA_CACHE_DIR=$HOME/.nv/ComputeCache/ && rm -rf $CUDA_CACHE_DIR\n",
    "$ export VIENNACL_CACHE_DIR=/tmp/viennacl-cache/ && rm -rf $VIENNACL_CACHE_DIR && mkdir $VIENNACL_CACHE_DIR\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=proposed-cuda-cache-cold \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=1 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,gtx-1080,cuda-cache-cold\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=proposed-cuda-cache-warm \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,gtx-1080,cuda-cache-warm\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=proposed-cache-none \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.CUDA_CACHE_DISABLE=1 \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,gtx-1080,cache-none\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=proposed-cache-cold \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=1 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.CUDA_CACHE_DISABLE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,gtx-1080,cache-cold\n",
    "$ ck benchmark program:caffe \\\n",
    "  --record --record_failed \\\n",
    "  --record_repo=local --record_uoa=proposed-cache-warm \\\n",
    "  --dvdt_prof --skip_stat_analysis \\\n",
    "  --cmd_key=time_gpu --cpu_freq=max --repetitions=3 \\\n",
    "  --env.CK_CAFFE_BATCH_SIZE=1 --env.CUDA_CACHE_DISABLE=1 --env.VIENNACL_CACHE_PATH=$VIENNACL_CACHE_DIR \\\n",
    "  --tags=caffe,opencl,build,compile,proposed,gtx-1080,cache-warm\n",
    "```\n",
    "\n",
    "The experimental data were archived as follows:\n",
    "```\n",
    "$ ck zip local:experiment:proposed*cache* \\\n",
    "  --archive_name=ck-caffe-opencl-build-compile-proposed-gtx-1080.zip\n",
    "```\n",
    "\n",
    "The resulting archive was copied to another machine and prepared for analysis as follows:\n",
    "```\n",
    "$ ck add repo:ck-caffe-opencl-build-compile-proposed-gtx-1080 \\\n",
    "  --zip=ck-caffe-opencl-build-compile-proposed-gtx-1080.zip --quiet\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Experimental analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "repo_uoa = 'ck-caffe-opencl-build-compile-proposed-gtx-1080'\n",
    "common_tags = 'caffe,opencl,build,compile,proposed,gtx-1080'\n",
    "gtx_1080_proposed = get_results(repo_uoa, common_tags)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "show_results(gtx_1080_proposed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(gtx_1080_proposed, rot=90)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(gtx_1080_proposed \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[build_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plot_results(gtx_1080_proposed \\\n",
    "             .reorder_levels(['call', 'id', 'experiment']) \\\n",
    "             .ix[compile_calls] \\\n",
    "             .reorder_levels(['id', 'call', 'experiment']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"gtx_1080_compare\"></a>\n",
    "### GTX 1080 - compare the original mechanism vs the proposed mechanism"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "compare_results(gtx_1080_original, gtx_1080_proposed, 'cache-warm', 'clBuildProgram')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "compare_results(gtx_1080_original, gtx_1080_proposed, 'cache-warm', 'clCreateKernelsInProgram')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
